1.2 Introduction to Statistics

1 Statistical Models

Statistical Model

A statistical model is a family $\mathcal{P}$ of candidate distributions for the data $X$. We assume $X \sim P$ for some $P \in \mathcal{P}$, but we don't know which; $X$ yields evidence about which $P$ generated it.

1.1 Parametric vs Nonparametric Models

Parametric models are families of distributions indexed by $\theta \in \Theta$: $\mathcal{P} = \{P_\theta \mid \theta \in \Theta\}$, $\Theta \subseteq \mathbb{R}^d$ ($d$ is called the model dimension).
We write $P_\theta(\cdot)$ and $E_\theta(\cdot)$ for probability and expectation, also indexed by $\theta$.
In contrast, in a nonparametric model there is no natural way to parameterize $\mathcal{P}$ by a real vector.

A simple nonparametric model: $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} P$, where $P$ is ANY distribution on $\mathbb{R}$. Then $\mathcal{P} = \{P^n \mid P \text{ is a distribution on } \mathbb{R}\}$ and $X = (X_1, \dots, X_n) \sim P^n$.

However, we can use the "parametric" notation $\mathcal{P} = \{P_\theta \mid \theta \in \Theta\}$ WLOG (we can always take $\theta = P$, $\Theta = \mathcal{P}$).

1.2 Bayesian vs Frequentist Inference

So far we have assumed the data $X$ follow a distribution $P_\theta$ for a parameter $\theta$ we want to determine. Sometimes we will add the Bayesian assumption that $\theta$ itself is random, drawn from a known distribution $\Lambda$ called the prior.
This reduces the problem of inference to studying the conditional distribution of $\theta \mid X$.
Until we introduce Bayesian inference, however, we treat $\theta$ as a fixed value in $\Theta$.
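As an illustrative sketch of the Bayesian setup (the specific model is an assumption, not from the notes): with a Beta prior on $\theta$ and Bernoulli data, the conditional distribution $\theta \mid X$ is again a Beta distribution, so the prior-to-posterior update is a simple parameter update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: Beta(2, 2) prior on theta, Bernoulli(theta) data.
a, b = 2.0, 2.0                            # prior Lambda = Beta(a, b)
theta_true = 0.7                           # unknown in practice; fixed here to simulate
x = rng.binomial(1, theta_true, size=50)   # observed data X = (X_1, ..., X_n)

# Conjugacy: theta | X  ~  Beta(a + sum(x), b + n - sum(x)).
a_post = a + x.sum()
b_post = b + len(x) - x.sum()
post_mean = a_post / (a_post + b_post)     # a natural point summary of theta | X
print(a_post, b_post, post_mean)
```

The posterior mean blends the prior mean with the sample frequency, and the data's influence grows with $n$.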

2 Estimation

We want to determine the value of a parameter in a parametric model.

Loss Function, Risk Function

The loss function $L(\theta, d)$ is the disutility of guessing $g(\theta) = d$. It is typically non-negative, with $L(\theta, g(\theta)) = 0$.

E.g., squared error loss $L(\theta, d) = (g(\theta) - d)^2$.

The risk function is the expected loss of an estimator: $R(\theta; \delta) = E_\theta[L(\theta, \delta(X))]$.

For squared error loss, the risk is called the mean squared error (MSE): $\mathrm{MSE}(\theta; \delta) = E_\theta[(\delta(X) - g(\theta))^2]$.
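The risk is an expectation over the data distribution, so it can be approximated by Monte Carlo. A minimal sketch (the normal model and sample-mean estimator are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Approximate MSE(theta; delta) = E_theta[(delta(X) - theta)^2] by simulation,
# for delta(X) = sample mean of n i.i.d. N(theta, 1) draws and g(theta) = theta.
theta, n, reps = 2.0, 25, 100_000
samples = rng.normal(theta, 1.0, size=(reps, n))
estimates = samples.mean(axis=1)               # delta(X) for each replication
mse_hat = np.mean((estimates - theta) ** 2)
print(mse_hat)                                 # close to the analytic value 1/n = 0.04
```

Here the analytic risk is $\mathrm{Var}(\bar{X}_n) = 1/n$, which the simulation recovers to Monte Carlo accuracy.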

In brief, we have two primary strategies to choose an estimator:

  1. Summarize the risk function by a scalar.
  2. Restrict attention to a smaller class of estimators.

2.1 Comparing Estimators

Inadmissible, Strictly Dominate

An estimator $\delta$ is inadmissible if there exists an estimator $\delta'$ with

  • $R(\theta; \delta') \le R(\theta; \delta)$ for all $\theta$;
  • $R(\theta; \delta') < R(\theta; \delta)$ for some $\theta$.

We say $\delta'$ strictly dominates $\delta$.
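A concrete (assumed, not from the notes) instance of strict dominance: for $X_1, \dots, X_n$ i.i.d. $N(\theta, 1)$ under squared error, $\delta(X) = X_1$ has risk $1$ while $\delta'(X) = \bar{X}_n$ has risk $1/n$, at every $\theta$, so the sample mean strictly dominates the first observation.

```python
import numpy as np

rng = np.random.default_rng(2)

# For X_1,...,X_n i.i.d. N(theta, 1): risk of delta(X) = X_1 is 1,
# risk of delta'(X) = mean(X) is 1/n, uniformly in theta.
n, reps = 10, 50_000
for theta in (-3.0, 0.0, 5.0):                         # risk is flat in theta
    x = rng.normal(theta, 1.0, size=(reps, n))
    risk_first = np.mean((x[:, 0] - theta) ** 2)       # near 1
    risk_mean = np.mean((x.mean(axis=1) - theta) ** 2) # near 1/n = 0.1
    assert risk_mean < risk_first                      # dominance at this theta
    print(theta, round(risk_first, 3), round(risk_mean, 3))
```

Since the inequality is strict at every $\theta$, the estimator $X_1$ is inadmissible.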

2.2 Resolving Ambiguity

No estimator uniformly attains the smallest risk among all estimators. For instance, the trivial constant estimator $\delta(X) \equiv \frac{1}{2}$ performs best when $\theta$ is actually $\frac{1}{2}$. There are two main strategies to resolve this ambiguity.
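The ambiguity can be made concrete by plotting the two risk curves (a sketch assuming Bernoulli data, which fits the constant guess $\frac{1}{2}$): the constant estimator and the sample mean each win on part of the parameter space, so neither dominates.

```python
import numpy as np

# Squared-error risks for Bernoulli(theta) data, n = 10 (illustrative setup).
# Constant guess delta(X) = 1/2 has risk (theta - 1/2)^2; the sample mean
# has risk Var = theta * (1 - theta) / n.  Neither is uniformly better.
n = 10
thetas = np.linspace(0.0, 1.0, 101)
risk_const = (thetas - 0.5) ** 2
risk_mean = thetas * (1 - thetas) / n

# Near theta = 1/2 the constant estimator wins; near the endpoints the mean wins.
i_mid, i_end = 50, 0          # theta = 0.5 and theta = 0.0
print(risk_const[i_mid], risk_mean[i_mid])   # constant wins here
print(risk_const[i_end], risk_mean[i_end])   # sample mean wins here
```

The risk curves cross, which is exactly why a further criterion (a scalar summary, or a restricted class) is needed to pick an estimator.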

Summarize the risk function by a scalar, e.g., its maximum over $\theta$ (leading to minimax estimation) or its average under a prior (the Bayes risk).

Restrict the choice of estimators
Unbiased estimation: we can demand that an estimator satisfy $E_\theta[\delta_0(X)] = g(\theta)$ for all $\theta \in \Theta$.
Under unbiasedness, we can cleanly define an optimal estimator, called the UMVU (uniformly minimum variance unbiased) estimator. In the above example, $\delta_0(X) = \bar{X}_n$ is in fact the UMVU estimator.
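The unbiasedness condition can be checked by simulation. A minimal sketch, assuming Bernoulli data and $g(\theta) = \theta$ (an illustrative choice): the sample mean's expectation matches $\theta$ at every parameter value tried.

```python
import numpy as np

rng = np.random.default_rng(3)

# Check E_theta[delta_0(X)] = g(theta) for delta_0(X) = sample mean,
# g(theta) = theta, with X_1,...,X_n i.i.d. Bernoulli(theta).
n, reps = 20, 200_000
for theta in (0.1, 0.5, 0.9):
    x = rng.binomial(1, theta, size=(reps, n))
    est_mean = x.mean(axis=1).mean()          # approximates E_theta[delta_0(X)]
    assert abs(est_mean - theta) < 0.005      # unbiased: matches theta each time
    print(theta, est_mean)
```

Unbiasedness must hold for every $\theta \in \Theta$, which is why the check sweeps several parameter values rather than a single one.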